The concept "test fairness" has developed only recently. A major impetus in
the development and application of the concept has come from the publication
of the Uniform Guidelines on Employee Selection Procedures (UGES) in 1978.

The UGES are interpreted as mandating the use of a regression model in
evaluating test fairness. A technique was developed utilizing a regression
model to evaluate the fairness of the Flight Aptitude Selection Test (FAST)
for the groups identified by the UGES: Blacks, American Indians, Asians,
Hispanics, Caucasians and females. The regression of FAST scores on overall
grades in the Initial Entry Rotary Wing (IERW) course was performed for each
of the above groups in comparison with the majority group. Available
population sizes were considered too small to permit a conclusive fairness
evaluation at this time. The fairness evaluation will be repeated semiannually
until minority population sizes permit sufficient power to perform a definitive
analysis.

AN  EVALUATION  OF  THE  FAIRNESS  OF  THE  FLIGHT  APTITUDE  SELECTION  TEST  (FAST) 

"Fairness"  as  a  criterion  for  the  evaluation  of  a  test  or  other  selection 
procedure is a relatively new concept. The concept has evolved from the
technology of test validation to answer the question, "Is this test/procedure
valid for the selection of minority as well as majority applicants?"
Appropriate methodology for the evaluation of fairness is currently a matter
for debate in the technical literature (Ledvinka, 1979). A major impetus for
the development of fairness methodologies was the publication of Guidelines on
Employee Selection Procedures in 1970 by the Equal Employment Opportunity
Commission. In fact, the most current version, the Uniform Guidelines on
Employee Selection Procedures (UGES) (1978), noted that "The concept of
fairness or unfairness of selection procedures is a developing concept"
(Section 14B(8)). Since this technology is still developmental, this paper
will review the rationale and precedent for the FAST fairness evaluation in
some detail.

Technical standards for performing a fairness evaluation are addressed by
both professional and government agencies. The American Psychological
Association (APA) publication, Principles for the Validation and Use of
Personnel Selection Procedures (1975), discusses both technical and ethical
implications of the choice of methodology in fairness research designs. The
government publication referenced above, the Uniform Guidelines on Employee
Selection Procedures (UGES) published in 1978, is a codified position agreed
upon by the US Civil Service Commission, the Department of Justice, the EEOC,
and the Department of Labor. It falls under the scope of Title VII of the 1964
Civil Rights Act and, for that reason, carries the impact of law. Furthermore,
the current version of the UGES was reviewed by the APA prior to publication;
thus, it is a synthesis of professional and governmental guidance on the
technical, ethical and legal aspects of fairness research designs. For these
reasons, this paper will make frequent reference to the UGES.

The UGES define fairness by stating its obverse: "When members of one
race, sex, or ethnic group characteristically obtain lower scores on a
selection procedure than members of another group, and the differences in
scores are not reflected in differences in a measure of job performance, use
of the selection procedure may unfairly deny opportunities to members of the
group that obtains the lower scores" (Section 14B(8)(a)). This definition has
clear implications for the design of a fairness research study in that it
specifies that fairness should be defined in terms of the bivariate
distribution of test (or other selection procedure) scores and job performance
scores. Specifically, fairness is demonstrated by coincident regression of job
performance scores on test scores for a minority group and the majority group.
Fairness does not require that minority performance on the test, or on the
job, be equal to majority performance, but only that the test (or selection
procedure) does not overpredict or underpredict minority performance
vis-a-vis majority performance.
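
The notion of over- and underprediction can be illustrated with a minimal
sketch. The scores below are invented purely for illustration and the
function names are assumptions, not part of the UGES or of the FAST analysis.
The code fits the majority regression line and computes the mean of actual
minus predicted performance for the minority group. A positive value means
the majority line underpredicts minority performance; a negative value means
it overpredicts.

```python
from statistics import mean


def fit_line(x, y):
    """Least-squares slope and intercept for predicting y from x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def mean_prediction_error(majority, minority):
    """Mean of (actual - predicted) minority performance when the
    majority regression line is used for prediction."""
    slope, intercept = fit_line(*majority)
    x_min, y_min = minority
    return mean(y - (intercept + slope * x) for x, y in zip(x_min, y_min))


# Hypothetical data: (test scores, performance scores) for each group.
majority = ([300, 320, 340, 360, 380], [80.0, 82.0, 84.0, 86.0, 88.0])
minority = ([280, 300, 320, 340], [79.0, 81.0, 83.0, 85.0])

error = mean_prediction_error(majority, minority)
print(f"mean prediction error for minority group: {error:+.2f}")
```

In this invented example the minority group scores lower on the test yet
performs one point better than the majority line predicts at every score, so
the test would underpredict minority performance.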

The UGES do not require routine demonstration of the fairness of a
selection procedure for every minority group identified in section 4B (Blacks,
American Indians, Asians, Hispanics and Caucasians). Section 14B(8)(b) states:
"Where a selection procedure results in an adverse impact on a race, sex, or
ethnic group identified in accordance with the classifications set forth in
section 4 above and that group is a significant factor in the relevant labor
market, the user generally should investigate the possible existence of
unfairness for that group if it is technically feasible to do so." In other
words, a demonstration of fairness is required only where:

(1)  there  is  evidence  of  adverse  impact  as  defined  in  section  4D  of  the 
UGES; 

(2)  that  adverse  impact  affects  a  group  identified  in  section  4B  of  the 
UGES; 

(3) the group(s) affected comprise a significant factor in the relevant
labor market, which is defined in section 15A(1)(c) as constituting more than
2% of the labor force in a "relevant labor area";

(4)  it  is  "technically  feasible"  to  investigate  the  fairness  issue. 
Technical  feasibility  is  defined  in  section  14B(8)(c)  to  include: 

(a)  sufficient  sample  sizes  to  achieve  statistical  significance; 

(b)  direct  comparability  of  the  samples  in  terms  of  the  actual  jobs 
performed. 
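
Conditions (1) and (3) lend themselves to a simple numerical check. The
sketch below is illustrative only. It assumes the four-fifths rule of thumb
from Section 4D of the UGES (a group's selection rate below 80% of the
highest group's rate is generally taken as evidence of adverse impact), which
the text above cites but does not spell out. The counts, group labels and
function names are hypothetical, and conditions (2) and (4) are not modeled.

```python
def selection_rate(selected, applicants):
    """Proportion of applicants who were selected."""
    return selected / applicants


def needs_fairness_study(groups, labor_share, min_share=0.02, ratio=0.8):
    """Return the groups that meet conditions (1) and (3): a selection rate
    below `ratio` times the highest group's rate, and a labor-force share
    above `min_share`."""
    rates = {g: selection_rate(s, a) for g, (s, a) in groups.items()}
    top = max(rates.values())
    return [g for g, r in rates.items()
            if r / top < ratio and labor_share[g] > min_share]


# Hypothetical counts: group -> (number selected, number of applicants).
groups = {"A": (60, 100), "B": (20, 50), "C": (9, 15)}
# Hypothetical share of the relevant labor force for each group.
labor_share = {"A": 0.80, "B": 0.15, "C": 0.01}

flagged = needs_fairness_study(groups, labor_share)
print(flagged)
```

Here group B is selected at two-thirds the rate of the highest group and
makes up more than 2% of the labor force, so it is the only group flagged.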

¹At this writing, military personnel in DOD agencies do not fall under the
purview of Title VII and thus may not be legally bound by the UGES. However,
the author takes the position that the UGES represent current professional
thinking in this technical area and therefore provide appropriate guidance
independent of their status as law.

The issues raised in paragraphs 1-3 above are empirical questions. They
are best answered by descriptive data pertaining to the population of
applicants to US Army flight training. The Fort Rucker Field Unit of ARI began
an investigation of the selection rates of applicants of the groups identified
in section 4B of the UGES. Data were requested from MILPERCEN and RCPAC, and a
quality check was performed on the data obtained from the master files. Master
file data were cross-referenced with data in the student pilots' flight
folders at the Directorate of Training at Fort Rucker. Taking the Black group
as an example, master file data were missing for over 78% of the trainees,
i.e., 78% of individuals who had entered the flight training course did not
appear in the master file. Therefore, it must be concluded that the selection
rates prior to 1980 are indeterminate and adverse impact cannot be assessed.

With the advent of the revised FAST test (RFAST), which replaced the
earlier form in the field in early 1980, the data collection problem
referenced above has been alleviated. The RFAST answer sheet requests
information on the sex and ethnic status of applicants. All RFAST answer
sheets are sent to ARI, Fort Rucker for machine scoring and storage in the
RFAST archives; thus, all the information needed to determine whether or not
adverse impact exists will be available at ARI, Fort Rucker. Given that it
commonly takes more than one year between taking the RFAST and graduation
from the 34-week training program, it will be some time before adverse impact
can be determined for the RFAST.

In the interim, the conservative assumptions will be made that adverse
impact does exist for all the groups identified by Section 4B of the UGES,
and that each of those groups constitutes more than 2% of the applicant
population. Pursuant to Section 14B(8) of the UGES, a fairness evaluation will
be undertaken for each group where it is "technically feasible" to do so.

However, the issue of technical feasibility is, like the issue of fairness,
a matter of some debate in the technical literature. As noted above, the
UGES discuss the issue of technical feasibility with reference to sample
size and comparability. In an empirical study of the statistical power
associated with various sample sizes, Schmidt, Hunter and Urry (1976)
concluded:

"This study demonstrates that sample sizes required to produce
adequate power in empirical validation studies are substantially
larger than has typically been assumed. This finding leads to
the conclusion that, from the viewpoint of sample-size
requirements, criterion-related validity studies are "technically
feasible" much less frequently than is commonly assumed (p. 473)."

Using the methodology developed by Schmidt, Hunter and Urry (1976) to estimate
the sample size required in the present evaluation, and making the liberal
assumptions that (1) the true validity of the FAST test is .50, (2) the
reliability of the Initial Entry Rotary Wing (IERW) overall grade is .60 and
(3) 70% of the applicants to the IERW program are accepted, 128 subjects per
group would be required to reach a power of .90 (i.e., to have a 90%
probability of rejecting the null hypothesis if it is indeed false). Thus,
from the standpoint of the Schmidt, Hunter and Urry (1976) article, it is not
technically feasible to perform a fairness evaluation of the FAST until a
larger sample of IERW graduates is available.
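
The kind of calculation involved can be sketched as follows. This is not a
reconstruction of the Schmidt, Hunter and Urry tables. It is a rough
approximation under assumptions that are not stated in the text: the true
validity is attenuated by criterion unreliability, then reduced for direct
top-down selection on a normally distributed predictor, and the required
sample size comes from the Fisher z approximation for a two-tailed test at
the .05 level. All function names are illustrative.

```python
from math import atanh, ceil, sqrt
from statistics import NormalDist

_N = NormalDist()  # standard normal distribution


def observed_validity(true_validity, criterion_reliability, selection_ratio):
    """Expected observed correlation after criterion unreliability and
    direct selection of the top `selection_ratio` on the predictor."""
    r = true_validity * sqrt(criterion_reliability)   # attenuation
    cut = _N.inv_cdf(1.0 - selection_ratio)           # cutoff in z units
    lam = _N.pdf(cut) / selection_ratio
    u = sqrt(1.0 + cut * lam - lam ** 2)              # restricted SD / full SD
    return r * u / sqrt(1.0 - r ** 2 + (r * u) ** 2)  # range restriction


def required_n(rho, power=0.90, alpha=0.05, two_tailed=True):
    """Smallest n giving the requested power for a test of rho = 0,
    based on the Fisher z approximation."""
    z_alpha = _N.inv_cdf(1.0 - (alpha / 2 if two_tailed else alpha))
    z_power = _N.inv_cdf(power)
    return ceil(((z_alpha + z_power) / atanh(rho)) ** 2 + 3)


rho = observed_validity(0.50, 0.60, 0.70)
n = required_n(rho)
print(f"expected observed validity: {rho:.3f}, required n: {n}")
```

With the three assumptions listed above, this approximation lands close to
the 128 subjects per group cited in the text, although that agreement does not
show that the published tables were computed in this way.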


An earlier section of this research report noted that a revised version of
the FAST (the RFAST) is presently being implemented in the field. The version
of the FAST being evaluated for fairness in this report has two different
forms, developed for use with commissioned officers and enlisted personnel
respectively. Since the two forms differ substantially in content and number
of items, the current fairness evaluation must be conducted separately for
these two populations. There is only one form of the RFAST, which has been
developed for use with both populations. Therefore, future fairness
evaluations will not require separate commissioned and enlisted samples,
which will considerably ameliorate the problem of collecting samples large
enough to permit a conclusive fairness evaluation.

One key issue in the design of a fairness evaluation study is the choice
of a statistical model to guide the minority/majority comparisons. Section
14B(8) of the UGES raises the point that the concept of fairness is still
evolving in the literature. Specifically, the choice of a statistical model
has been debated for nearly a decade since the publication of the 1970 version
of the EEOC Guidelines (see Cole, 1972; Hunter and Schmidt, 1974; Hunter,
Schmidt and Rauschenberger, 1977; and Ledvinka, 1979). The current literature
focuses on four models which lead to different operational definitions of
fairness/unfairness:

1.  The  regression  model  (Cleary,  1968)  which  states  that  a  test  is  fair 
if  the  regression  lines  predicting  job  performance  are  the  same  (plus  or  minus 
sampling  variation)  for  minority  and  majority  groups. 

2. The conditional probability model (Darlington, 1971; Cole, 1973) which
states that a test is fair if the probability of being selected is the same
for minority and majority group members who are actually capable of
satisfactory job performance.

3. The constant ratio model (Thorndike, 1971) which states that a test is
fair if its selection ratio for minority and majority groups is the same as
the selection ratio using a perfectly valid test (or using the criterion
measure itself for selection).

4.  The  quota  model  which  states  that  a  test  is  fair  if  its  selection  ratio 
is  the  same  for  all  minority  and  majority  groups  regardless  of  group  performance 
on  the  job. 
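
The last three models compare selection and success rates rather than
regression lines, and the quantities they examine can be computed side by
side. The sketch below is a simplified illustration, not an implementation
taken from the cited papers. The records are hypothetical, the function names
are assumptions, and it presumes that the job outcome is known for every
applicant, whether selected or not.

```python
def fairness_quantities(records):
    """records: list of (group, selected, successful) tuples with booleans.
    Returns, for each group, the quantity each model compares across groups."""
    out = {}
    for group in sorted({r[0] for r in records}):
        rows = [r for r in records if r[0] == group]
        n = len(rows)
        n_sel = sum(1 for _, sel, _ in rows if sel)
        n_suc = sum(1 for _, _, suc in rows if suc)
        n_both = sum(1 for _, sel, suc in rows if sel and suc)
        out[group] = {
            # Quota model: proportion of the group selected.
            "selection_ratio": n_sel / n,
            # Conditional probability model: P(selected | successful).
            "p_selected_given_success": n_both / n_suc,
            # Constant ratio model: number selected relative to number successful.
            "selected_to_successful": n_sel / n_suc,
        }
    return out


def make(group, both, selected_only, successful_only, neither):
    """Build hypothetical records from the four cell counts."""
    return ([(group, True, True)] * both
            + [(group, True, False)] * selected_only
            + [(group, False, True)] * successful_only
            + [(group, False, False)] * neither)


# Hypothetical applicant pools of ten per group.
records = make("majority", 4, 2, 1, 3) + make("minority", 2, 1, 2, 5)
q = fairness_quantities(records)
for group, values in q.items():
    print(group, values)
```

Under each of these three models the test is judged fair when the
corresponding quantity is equal across groups, so the invented data above
would be judged unfair by all three.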

While various authors continue to argue the technical and ethical merits
of these models, it has been pointed out by Ledvinka (1979, p. 552) and by
Hunter, Schmidt and Rauschenberger (1977, p. 256) that the UGES clearly
specify the regression model as being legally appropriate in the conduct of
fairness research. Two UGES passages can be cited to document this point.

"When members of one race, sex, or ethnic group characteristically
obtain lower scores on a selection procedure than members of
another group, and the differences in scores are not reflected
in differences in a measure of job performance . . . ." [Section
14B(8)(a)]

"If unfairness is demonstrated through a showing that members
of a particular group perform better or poorer on the job than
their scores on the selection procedure would indicate through
comparison with how members of other groups perform, the user
may either revise or replace the selection instrument in
accordance with these guidelines, or may continue to use the selection
instrument operationally with appropriate revisions in its use
to assure compatibility between the probability of successful
job performance and the probability of being selected." [Section
14B(8)(d)]

There is an additional, independent reason to use the regression model
in this fairness evaluation. Of the four models, it alone does not require
a "pass-through" methodology in which IERW applicants are selected for
flight training regardless of their FAST scores. While a pass-through
methodology is technically appropriate in fairness research, it incurs
a substantial increase in attrition rate over the use of an efficacious
selection procedure. Given that the training costs in the IERW program
exceed $125,000 per trainee, the two consequences of a pass-through program,
higher attrition costs and a reduced output of trainees, could conceivably
cost the government millions of dollars per year and lead to an even greater
shortfall in aviators in the field.

METHOD 

The subjects that comprise the minority/female samples include all IERW
program trainees who identified themselves as belonging to one of the groups
previously identified in the UGES (Black, Hispanic, Asian, American Indian,
female) and for whom both FAST and IERW overall grade (OAG) data were
available in US Army Aviation Center (USAAVNC) records. The data collected
cover the time span July 1975 to July 1979.

In order to develop the regression comparison procedure and to estimate
the fairness of the FAST as a predictor of performance in the IERW Program,
a sample of the FAST and OAG scores for majority trainees was selected.
During the same time period that scores were monitored for the minority
samples described in this report, a random sample of 10% of majority
officers and 10% of majority WOCs was drawn from the majority population.

The sample sizes for minority/female and majority commissioned officers
and WOCs are presented in Table 1.

The Introduction Section of this paper developed the concept that the
evaluation of test fairness requires the comparison of minority/female and
majority regression lines. A statistical technique was specifically
formulated for this purpose by Gulliksen and Wilks (1950). Additionally, there
is precedent for the application of this procedure under the mandate of the
UGES (Reilly, Zedeck, and Tenopyr, 1979). The Gulliksen-Wilks technique,
which was derived from Neyman-Pearson likelihood ratio test theory, tests
three null hypotheses sequentially (1950, p. 96):

1. H1 is the hypothesis that the populations from which the samples
were drawn have equal standard errors of the estimate (around the least
squares regression line).

2.  H2  is  the  hypothesis  that  the  slopes  of  the  population  regression 
lines  are  the  same. 

3.  H3  is  the  hypothesis  that  the  Y-intercepts  of  the  regression  lines 
are  equal. 

In applying the technique, the three hypotheses are tested sequentially
starting with H1. If any hypothesis is rejected, hypothesis testing stops
and it is concluded that the samples were drawn from different bivariate
populations. If all three null hypotheses are retained, then the samples
have the same bivariate dispersion, slope and intercept and thus coincident
regression lines.

In applying the Gulliksen-Wilks technique to the current fairness
evaluation, a significant problem arises because of the small sample sizes
currently available for ethnic and female IERW trainees. Gulliksen and Wilks
state that their primary purpose is, ". . . to present large-sample tests for
the hypotheses considered from the point of view of Neyman-Pearson likelihood
ratio test theory (1950, p. 94)." The smallest sample in the Reilly et al.
(1979) experiments included ^5 subjects. A conservative statistician would
prefer to have 100 data points in a "large sample" bivariate distribution.
However, it is clear that the sample sizes in the current research, which
range from a high of 22 Black Officers to a low of 3 Oriental Officers, do
not meet the sample size requirement for the Gulliksen-Wilks procedure.

A search of the statistics literature produced a regression line
comparison procedure which was derived from the analysis of covariance rather
than from Neyman-Pearson likelihood ratio theory. Snedecor and Cochran
(1967, pp. 432-436) present a procedure which tests the same three
sequential hypotheses discussed by Gulliksen and Wilks (1950). This procedure,
while it is sensitive to the usual assumptions made by parametric
statistics, is not based on the assumption of large sample sizes.
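
An analysis-of-covariance comparison of two regression lines can be
sketched as follows. This is a textbook-style outline of the three F ratios,
written under the usual assumptions (normal, independent errors). It has not
been checked line by line against the cited pages and it is not the program
used in this research. The data and function names are hypothetical.

```python
def sums(x, y):
    """Return n and the corrected sums of squares and products."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    syy = sum((b - my) ** 2 for b in y)
    return n, sxx, sxy, syy


def resid_ss(sxx, sxy, syy):
    """Residual sum of squares about the least-squares line."""
    return syy - sxy ** 2 / sxx


def compare_lines(x1, y1, x2, y2):
    """Return {hypothesis: (F, df_numerator, df_denominator)}."""
    n1, sxx1, sxy1, syy1 = sums(x1, y1)
    n2, sxx2, sxy2, syy2 = sums(x2, y2)
    ss1 = resid_ss(sxx1, sxy1, syy1)
    ss2 = resid_ss(sxx2, sxy2, syy2)

    # H1: equal residual variances (larger mean square in the numerator).
    ms1, ms2 = ss1 / (n1 - 2), ss2 / (n2 - 2)
    if ms1 >= ms2:
        h1 = (ms1 / ms2, n1 - 2, n2 - 2)
    else:
        h1 = (ms2 / ms1, n2 - 2, n1 - 2)

    # H2: equal slopes. Compare separate-slope and common-slope fits.
    ss_separate = ss1 + ss2
    df_separate = n1 + n2 - 4
    ss_common = resid_ss(sxx1 + sxx2, sxy1 + sxy2, syy1 + syy2)
    h2 = ((ss_common - ss_separate) / (ss_separate / df_separate),
          1, df_separate)

    # H3: equal elevations. Compare the common-slope fit with a single line.
    df_common = n1 + n2 - 3
    _, sxx_t, sxy_t, syy_t = sums(list(x1) + list(x2), list(y1) + list(y2))
    ss_single = resid_ss(sxx_t, sxy_t, syy_t)
    h3 = ((ss_single - ss_common) / (ss_common / df_common), 1, df_common)
    return {"H1": h1, "H2": h2, "H3": h3}


# Hypothetical data: same slope and scatter, intercepts five points apart.
x = [1, 2, 3, 4, 5, 6]
y_a = [2.1, 3.9, 6.2, 7.8, 10.1, 11.9]
y_b = [v + 5.0 for v in y_a]
result = compare_lines(x, y_a, x, y_b)
for name, (f_ratio, df_num, df_den) in result.items():
    print(f"{name}: F({df_num}, {df_den}) = {f_ratio:.3f}")
```

Each F ratio would be compared with a tabled critical value for the stated
degrees of freedom, and testing would stop at the first rejection. In the
invented data only the elevation test (H3) yields a large F. If SciPy is
available, `scipy.stats.f.sf(F, df_num, df_den)` gives the upper-tail
probability.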

RESULTS

Table 1 presents sample sizes, means, and standard deviations for the
Commissioned Officer and WOC samples. In addition, the correlation of the
FAST and overall grade for each group and the significance of that
correlation coefficient are shown. At least in part because of the small
sample sizes of the minority and female samples, only 2 of the 10 correlations
attained significance. In both of the majority samples, the FAST proved to
be a significant predictor of overall grade despite the restriction in range
caused by the prior use of FAST scores as a selection criterion (Commissioned
Officers must score at least 155 and enlisted or civilian entrants must score
at least 300² to gain admission to the IERW training program). In reality, the
restriction of range problem applies only to the WOC samples since very few
of the Commissioned Officer applicants score below 155. The lesser
restriction of range in the officer sample is the most probable explanation
for the generally higher correlations in that group, as contrasted to the WOC
samples.

²Since these data were collected, the FAST cutoff score for WOCs was reduced
to 270.

The three hypotheses tested in the fairness evaluation concern the equality
of the standard errors of the estimate, the slopes, and the Y-intercepts for
the minority/female and majority regression lines. The logic of the hypothesis
test procedure requires that the three hypotheses be tested sequentially. That
is, the hypothesis of equal dispersion about the common regression line is
tested first. If that F-ratio reaches significance, the hypothesis test
procedure stops and it is concluded that the two samples are not taken from
the same bivariate population. If the F-test for equality of variance about
the common regression line is nonsignificant, then the second hypothesis is
tested, i.e., the two slopes are compared. Again, if the F-ratio reaches
significance, it is concluded that the two regression lines are not the same.
If the F-ratio is nonsignificant, then the third hypothesis is tested, i.e.,
the Y-intercepts (or elevations) of the two regression lines are compared.
Again, if the F-ratio reaches significance, it is concluded that the two
samples did not come from the same bivariate population. Only if all three
hypothesis tests yield nonsignificant F-ratios can it be concluded that the
two regression lines are coincident.

Given the very small population sizes available at the time this research
was undertaken, it might be misleading to present hypothesis test results.
The statistical power, even in the largest minority/majority comparison, is
not sufficiently large to ensure rejection of the null hypotheses if they
are indeed false. Thus, these data will be retained and the fairness analysis
will be repeated semiannually until such time as sufficient data are available
to perform a conclusive study.


DISCUSSION 

As  noted  previously,  the  data  base  for  minority  and  female  IERW  trainees 
is  not  of  sufficient  size  to  permit  drawing  conclusions  regarding  the  fairness 
of  the  FAST  as  a  selection  device.  The  purpose  of  this  paper  is  to  develop 
the  rationale  and  methodology  for  such  a  fairness  evaluation.  Thus,  the 
current  discussion  will  focus  primarily  on  methodological  issues. 

In accordance with the UGES, the fairness of a selection procedure should
be determined by reference to the regression of that selection test (or
procedure) on job-referenced criteria. Section 14B(3) of the UGES notes that
training performance is an acceptable criterion under certain conditions:

"Where performance in training is used as a criterion, success
in training should be properly measured and the relevance of
the training should be shown either through a comparison of
the content of the training program with the critical or
important work behavior(s) of the job(s), or through a demonstration
of the relationship between measures of performance in training
and measures of job performance. Measures of relative success
in training include but are not limited to instructor evaluations,
performance samples, or tests."


The IERW training program clearly meets the conditions specified in 14B(3)
by virtue of the content of the training program and the measures of relative
success employed as grading procedures. The curriculum of the IERW Program
of Instruction (POI) has been developed specifically to train aviators to
perform Army aviation missions in the field. Thus, the content of the training
program corresponds very closely to the critical work behaviors performed on
the job. Training grades are composed of the three components identified in
the UGES: instructor evaluations (Instructor Pilot put-up scores), performance
samples (checkrides), and tests (academic examinations). The IERW overall
grade, which is used as a criterion in this research, is a composite of all
three evaluation components. In summary, the design of the current fairness
evaluation is in accordance with the directives of the UGES.

While the sample sizes for the minority/female groups presented in Table 1
are too small to justify the drawing of inferences to the entire populations
of female and minority aspirant aviators, several points warrant discussion.

For both Hispanic samples (Officer and WOC), the FAST has a nonsignificant
negative correlation with overall grade. Inspection of the scatter diagrams
in both cases reveals that, while the general linear trend is positive for
the entire sample, two or three outliers with extreme scores unduly influenced
the regression line. For example, in the Commissioned Officer sample, the
individual with the highest IERW overall grade, 89.35, has an unusually low
FAST score, 197. Expressed as standard scores, this individual's overall
grade is z = 1.44 whereas his FAST is z = -1.12. Conversely, the individual
with the lowest overall grade, 79.39, has a moderately high FAST score, 313.
Expressed as standard scores, overall grade z = -2.71 and FAST z = .83. If
these two individuals are removed from the distribution, the correlation for
the remaining 12 individuals is .193. The sensitivity of this correlation
coefficient to only two data points demonstrates the inappropriateness of
generalizing from the small minority and female samples in the current study.
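
The sensitivity of a small-sample correlation to a few extreme points is
easy to demonstrate. The sketch below uses invented scores, not the Hispanic
officer data discussed above. It computes the Pearson correlation with and
without two outlying cases of the kind described: one low test score paired
with a high grade, and one high test score paired with a low grade.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical (test score, overall grade) pairs; the last two are outliers.
pairs = [(250, 82.0), (260, 83.5), (270, 83.0), (280, 85.0),
         (290, 85.5), (300, 87.0), (200, 90.0), (320, 78.0)]

r_full = pearson_r(*zip(*pairs))          # all eight cases
r_trimmed = pearson_r(*zip(*pairs[:-2]))  # outliers removed

print(f"r with outliers: {r_full:.3f}; r without outliers: {r_trimmed:.3f}")
```

In this invented sample the six ordinary cases show a strong positive
relationship, yet adding the two outliers is enough to make the overall
correlation negative.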

The purpose of this research effort is to establish an appropriate
methodology to evaluate the FAST for fairness. The methodology reviewed in
this paper has been programmed for automated computation. Additionally,
a mechanism has been established to collect data on minority/female and
majority IERW trainees. As more minority/female trainees complete pilot
training, the fairness evaluation will be repeated until sample sizes provide
sufficient statistical power to draw conclusions about the fairness of the
FAST.